UPC’s Bilingual N-gram Translation System

Authors

  • José B. Mariño
  • Rafael E. Banchs
  • Josep M. Crego
  • Adrià de Gispert
  • Patrik Lambert
  • José A. R. Fonollosa
  • Marta R. Costa-jussà
  • Maxim Khalilov
Abstract

This paper describes the UPC’s bilingual n-gram approach to statistical machine translation, which implements the log-linear combination of a bilingual n-gram translation model with six additional feature functions. A brief description of the complete system is presented and special attention is devoted to the novel features and reordering strategies that have been recently implemented. Translation results for the Spanish-to-English and English-to-Spanish tasks considered during the TC-STAR’s second evaluation campaign are presented and discussed. Finally, improvements achieved in translation accuracy with respect to the previous year’s system are also evaluated and discussed.
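The core idea in the abstract, a log-linear combination of feature functions in which one feature is a bilingual n-gram model over tuples (paired source/target units), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the UPC implementation: it assumes a bigram tuple model with add-one smoothing, and the two features `tuple_lm` and `target_word_bonus`, their weights, and the toy corpus are all hypothetical stand-ins for the system's actual seven features and trained weights.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A tuple is a bilingual unit: (source segment, target segment).
BiTuple = Tuple[str, str]
BOS: BiTuple = ("<s>", "<s>")
EOS: BiTuple = ("</s>", "</s>")


class BigramTupleModel:
    """Bigram model over bilingual tuples with add-one smoothing.

    Assumption: the bigram order and add-one smoothing are illustrative
    choices, not the n-gram order or smoothing used by the UPC system.
    """

    def __init__(self, corpus: List[List[BiTuple]]) -> None:
        self.history_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        for sentence in corpus:
            padded = [BOS] + sentence + [EOS]
            self.history_counts.update(padded[:-1])
            self.bigram_counts.update(zip(padded[:-1], padded[1:]))
        # Predictable units: every training tuple plus the end marker.
        self.vocab_size = len({t for s in corpus for t in s} | {EOS})

    def log_prob(self, sentence: List[BiTuple]) -> float:
        """Natural-log probability of a tuple sequence."""
        padded = [BOS] + sentence + [EOS]
        total = 0.0
        for prev, cur in zip(padded[:-1], padded[1:]):
            num = self.bigram_counts[(prev, cur)] + 1
            den = self.history_counts[prev] + self.vocab_size
            total += math.log(num / den)
        return total


FeatureFn = Callable[[List[BiTuple]], float]


def loglinear_score(hyp: List[BiTuple],
                    features: Dict[str, FeatureFn],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of feature values: sum over m of lambda_m * h_m(hyp)."""
    return sum(weights[name] * h(hyp) for name, h in features.items())


# Toy Spanish-English tuple corpus (hypothetical data).
corpus = [
    [("la", "the"), ("casa", "house")],
    [("la", "the"), ("mesa", "table")],
]
model = BigramTupleModel(corpus)

# Hypothetical feature set and weights.
features: Dict[str, FeatureFn] = {
    "tuple_lm": model.log_prob,
    "target_word_bonus": lambda hyp: float(sum(len(t.split()) for _, t in hyp)),
}
weights = {"tuple_lm": 1.0, "target_word_bonus": 0.5}

hyp_a = [("la", "the"), ("casa", "house")]  # tuple order seen in training
hyp_b = [("casa", "house"), ("la", "the")]  # reordered, unseen tuple bigrams
score_a = loglinear_score(hyp_a, features, weights)
score_b = loglinear_score(hyp_b, features, weights)
print(f"{score_a:.3f} {score_b:.3f}")
```

Because the toy model has only seen `la casa` in that order, `hyp_a` receives the higher combined score. In a full decoder, the same weighted sum would be maximized over a search space of candidate tuple segmentations and orderings rather than over two fixed hypotheses.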


Related articles

Wider Context by Using Bilingual Language Models in Machine Translation

In past Evaluations for Machine Translation of European Languages, it could be shown that the translation performance of SMT systems can be increased by integrating a bilingual language model into a phrase-based SMT system. In the bilingual language model, target words with their aligned source words build the tokens of an n-gram based language model. We analyzed the effect of bilingual languag...


Statistical Machine Translation of Euparl Data by using Bilingual N-grams

This work discusses translation results for the four Euparl data sets which were made available for the shared task “Exploiting Parallel Texts for Statistical Machine Translation”. All results presented were generated by using a statistical machine translation system which implements a log-linear combination of feature functions along with a bilingual n-gram translation model.


Ncode: an Open Source Bilingual N-gram SMT Toolkit

This paper describes N, an open source statistical machine translation (SMT) toolkit for translation models estimated as n-gram language models of bilingual units (tuples). This toolkit includes tools for extracting tuples, estimating models and performing translation. It can be easily coupled to several other open source toolkits to yield a complete SMT pipeline. In this article, we review...


Smooth Bilingual N-Gram Translation

We address the problem of smoothing translation probabilities in a bilingual N-gram-based statistical machine translation system. It is proposed to project the bilingual tuples onto a continuous space and to estimate the translation probabilities in this representation. A neural network is used to perform the projection and the probability estimation. Smoothing probabilities is most important fo...


A Comparative Study on Translation Units for Bilingual Lexicon Extraction

This paper presents on-going research on automatic extraction of bilingual lexicon from English-Japanese parallel corpora. The main objective of this paper is to examine various N-gram models of generating translation units for bilingual lexicon extraction. Three N-gram models, a baseline model (Bound-length N-gram) and two new models (Chunk-bound N-gram and Dependency-linked N-gram) are compared...



Publication date: 2006